Composing Tree Graphical Models with Persistent Homology Features for Clustering Mixed-Type Data

نویسندگان

  • Xiuyan Ni
  • Novi Quadrianto
  • Yusu Wang
  • Chao Chen
چکیده

Clustering data with both continuous and discrete attributes is a challenging task. Existing methods often lack a principled probabilistic formulation. In this paper, we propose a clustering method based on a tree-structured graphical model to describe the generation process of mixed-type data. Our tree-structured model factorizes into a product of pairwise interactions, and thus localizes the interaction between feature variables of different types. To provide a robust clustering method based on the tree-model, we adopt a topographical view and compute peaks of the density function and their attractive basins for clustering. Furthermore, we leverage the theory from topology data analysis to adaptively merge trivial peaks into large ones in order to achieve meaningful clusterings. Our method outperforms state-of-the-art methods on mixed-type data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

متن کامل

Nucleos: a web server for the identification of nucleotide-binding sites in protein structures

Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleot...

متن کامل

Introduction to the R package TDA

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kerne...

متن کامل

Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest

This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...

متن کامل

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way those objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis This paper proposed an improved version of K-Means algorithm, namely Persistent K...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017